
    Non-response biases in surveys of schoolchildren: the case of the English Programme for International Student Assessment (PISA) samples

    We analyse response patterns to an important survey of schoolchildren, exploiting rich auxiliary information on respondents' and non-respondents' cognitive ability that is correlated both with response and with the learning achievement that the survey aims to measure. The survey is the Programme for International Student Assessment (PISA), which sets response thresholds in an attempt to control data quality. We analyse the case of England for 2000, when response rates were deemed sufficiently high by the organizers of the survey for the results to be published, and 2003, when response rates were a little lower and deemed of sufficient concern for the results not to be published. We construct weights that account for the pattern of non-response using two methods: propensity scores and the generalized regression estimator. There is clear evidence of biases, but no indication that the slightly higher response rates in 2000 were associated with higher-quality data. This underlines the danger of using response rate thresholds as a guide to data quality.
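
    The propensity-score weighting step described above can be illustrated with a minimal sketch. This is not the authors' code; the data frame, the column name 'responded', and the logistic response model are illustrative assumptions:

        import pandas as pd
        import statsmodels.api as sm

        def response_propensity_weights(df, covariates):
            """Inverse response-propensity weights for non-response adjustment.

            df must contain a 0/1 column 'responded'; `covariates` lists
            auxiliary variables (e.g. prior cognitive test scores) observed
            for respondents and non-respondents alike.
            """
            X = sm.add_constant(df[covariates])
            fit = sm.Logit(df['responded'], X).fit(disp=0)
            p_hat = fit.predict(X)
            # Each respondent is reweighted by the inverse of their estimated
            # response probability; non-respondents receive weight zero.
            return df['responded'] / p_hat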

    Comments on the Rao and Fuller (2017) paper

    This note by Chris Skinner presents a discussion of the paper “Sample survey theory and methods: Past, present, and future directions”, in which J.N.K. Rao and Wayne A. Fuller share their views on developments in sample survey theory and methods over the past 100 years.

    Special issue on the theory and practice of differential privacy

    This special issue presents papers based on contributions to the first international workshop on the “Theory and Practice of Differential Privacy” (TPDP), held in London, UK, on 18 April 2015 as part of the European Joint Conferences on Theory and Practice of Software (ETAPS). Differential privacy is a mathematically rigorous definition of the privacy protection provided by a data release mechanism: it offers a strong, guaranteed bound on what can be learned about a user as a result of participating in a differentially private data analysis. Researchers in differential privacy come from several areas of computer science, including algorithms, programming languages, security, databases and machine learning, as well as from several areas of statistics and data analysis. The workshop was intended as an occasion for researchers from these different areas to discuss recent developments in the theory and practice of differential privacy. The program of the workshop included 10 contributed talks, 1 invited talk, and 1 invited talk held jointly with the workshop “Hot Issues in Security Principles and Trust” (HotSpot 2016). Participants at the workshop were invited to submit papers to this special issue. Six papers were accepted, most of which directly reflect talks presented at the workshop.
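
    For reference, the "strong, guaranteed bound" mentioned above is usually stated as follows: a randomized mechanism M is ε-differentially private if, for every pair of datasets D and D' differing in one individual's record and every set S of possible outputs,

        \Pr[M(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[M(D') \in S],

    so that the presence or absence of any single record changes the probability of reaching any conclusion by a factor of at most e^ε.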

    Introduction to the design and analysis of complex survey data

    We give a brief overview of common sampling designs used in a survey setting and introduce the principal inferential paradigms under which data from complex surveys may be analyzed. In particular, we distinguish between design-based, model-based and model-assisted approaches. Simple examples highlight the key differences between the approaches. We discuss the interplay between inferential approaches and targets of inference, as well as the important issue of variance estimation.
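
    As a concrete instance of the design-based paradigm distinguished above, here is a minimal Horvitz-Thompson estimator of a population total; the function name and toy numbers are illustrative, not taken from the paper:

        import numpy as np

        def horvitz_thompson_total(y, pi):
            """Design-based (Horvitz-Thompson) estimate of a population total.

            y  : study variable observed for the sampled units
            pi : first-order inclusion probabilities of those units
            """
            y = np.asarray(y, dtype=float)
            pi = np.asarray(pi, dtype=float)
            # Each sampled unit stands in for 1/pi_i population units.
            return np.sum(y / pi)

        print(horvitz_thompson_total([10.0, 4.0, 7.0], [0.5, 0.2, 0.1]))  # 110.0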

    Estimation of dyadic characteristics of family networks using sample survey data

    We consider the use of sample survey data to estimate dyadic characteristics of family networks, with an application to noncoresident parent–child dyads. We suppose that survey respondents report about a dyad from either a parent or a child perspective, depending on their membership of the dyad. We construct separate estimators of common dyadic characteristics using data from both perspectives and show how comparisons of these estimators can shed light on data quality issues, including differential missingness and reporting error. In our application we find that a simple missingness model explains some striking patterns of discrepancies between the estimators, and we consider the use of poststratification and a related redefinition of count variables to adjust for these discrepancies. We also develop approaches to combining the separate estimators efficiently to estimate means and frequency distributions within subpopulations.
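
    One standard way to pool two estimators of the same quantity, of the kind the abstract combines, is inverse-variance weighting. The sketch below assumes both perspective-specific estimators are approximately unbiased; it is an illustration, not the paper's actual combination rule:

        def combine_estimates(est_parent, var_parent, est_child, var_child):
            """Inverse-variance-weighted combination of the parent-report and
            child-report estimates of a common dyadic characteristic."""
            w_p, w_c = 1.0 / var_parent, 1.0 / var_child
            combined = (w_p * est_parent + w_c * est_child) / (w_p + w_c)
            return combined, 1.0 / (w_p + w_c)  # estimate and its variance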

    Imputation under informative sampling

    Imputed values in surveys are often generated under the assumption that the sampling mechanism is non-informative (or ignorable) and the study variable is missing at random (MAR). When the sampling design is informative, the assumption of MAR in the population does not necessarily imply MAR in the sample. In this case, the classical method of imputation using a model fitted to the sample data does not in general lead to unbiased estimation. To overcome this problem, we consider alternative approaches to imputation that assume MAR in the population. We compare the alternative imputation procedures through simulation and through an application to the estimation of mean erosion using data from the Conservation Effects Assessment Project.
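
    One simple correction consistent with the problem described above is to fit the imputation model with survey weights, so that the fit targets the population rather than the informatively drawn sample. The sketch below, with assumed column names, shows that idea; it is not necessarily one of the paper's procedures:

        import statsmodels.api as sm

        def impute_with_weights(df, y, covariates, w):
            """Fill in missing values of y from a linear imputation model
            fitted with survey weights (column names are illustrative)."""
            obs = df[y].notna()
            X_obs = sm.add_constant(df.loc[obs, covariates])
            fit = sm.WLS(df.loc[obs, y], X_obs,
                         weights=df.loc[obs, w]).fit()
            X_mis = sm.add_constant(df.loc[~obs, covariates], has_constant='add')
            out = df[y].copy()
            out.loc[~obs] = fit.predict(X_mis)
            return out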

    Analysis of categorical data for complex surveys

    This paper reviews methods for handling complex sampling schemes when analysing categorical survey data. It is generally assumed that the complex sampling scheme does not affect the specification of the parameters of interest, only the methodology for making inference about these parameters. The organisation of the paper is loosely chronological. Contingency table data are considered first, before the paper moves on to the analysis of unit-level data. Weighted least squares methods, introduced in the mid 1970s along with methods for two-way tables, receive early attention. They are followed by more general methods based on maximum likelihood, particularly pseudo maximum likelihood estimation. Point estimation methods typically involve the use of survey weights in some way. Variance estimation methods are described in broad terms. There is a particular emphasis on methods of testing. The main modelling methods considered are log-linear models, logit models, generalized linear models and latent variable models. Multilevel models are not covered.
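
    Pseudo maximum likelihood estimation, mentioned above, weights each sampled unit's log-likelihood contribution by its survey weight so that the weighted score equations estimate their census counterparts. A minimal logit-model sketch with illustrative column names (note that the naive standard errors from such a fit are not design-correct and require linearization or replication methods):

        import statsmodels.api as sm

        def pseudo_mle_logit(df, y, covariates, w):
            """Survey-weighted (pseudo-ML) point estimation of a logit model."""
            X = sm.add_constant(df[covariates])
            return sm.GLM(df[y], X, family=sm.families.Binomial(),
                          freq_weights=df[w]).fit()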

    Hyper-resolution mapping of regional storm surge and tide flooding: comparison of static and dynamic models

    Storm tide (the combination of storm surge and the astronomical tide) flooding is a natural hazard with significant global social and economic consequences. For this reason, government agencies and stakeholders need storm tide flood maps to determine the population and infrastructure at risk from present and future levels of inundation. Computer models of varying complexity are able to produce regional-scale storm tide flood maps, and current model types are either static or dynamic in their implementation. Static models of storm tide use storm tide heights to inundate locations hydrologically connected to the coast, whilst dynamic models simulate the physical processes that cause flooding. Static models have been used in regional-scale storm tide flood impact assessments, but model limitations and coarse spatial resolutions contribute to uncertain impact estimates. Dynamic models are better at estimating flooding and impact but are computationally expensive. In this study we have developed a dynamic reduced-complexity model of storm tide flooding that is computationally efficient and can be applied at hyper-resolutions (<100 m cell size) over regional scales. We test the performance of this dynamic reduced-complexity model and a separate static model at three test sites where storm tide observational data are available. Additionally, we perform a flood impact assessment at each site using the dynamic reduced-complexity and static model outputs. Our results show that static models can overestimate observed flood areas by up to 204% and estimate more than twice the number of people, infrastructure, and agricultural land affected by flooding. Overall, we find that a reduced-complexity dynamic model of storm tide provides more conservative estimates of coastal flooding and impact.
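
    The static ("bathtub") approach described above reduces to flooding every cell below the storm tide height that is hydrologically connected to the coast. A toy flood-fill version over a gridded elevation model, purely for illustration and far simpler than the models compared in the paper:

        import numpy as np
        from collections import deque

        def static_flood(dem, tide_height, coast_cells):
            """Mark cells as flooded if below tide_height AND 4-connected to
            the coast through other such cells (hydrological connectivity).
            dem is a 2-D elevation array; coast_cells is a list of (i, j)."""
            flooded = np.zeros(dem.shape, dtype=bool)
            queue = deque(c for c in coast_cells if dem[c] < tide_height)
            for c in queue:
                flooded[c] = True
            while queue:
                i, j = queue.popleft()
                for ni, nj in ((i+1, j), (i-1, j), (i, j+1), (i, j-1)):
                    if (0 <= ni < dem.shape[0] and 0 <= nj < dem.shape[1]
                            and not flooded[ni, nj]
                            and dem[ni, nj] < tide_height):
                        flooded[ni, nj] = True
                        queue.append((ni, nj))
            return flooded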

    The Structure of Liquid and Amorphous Hafnia.

    Understanding the atomic structure of amorphous solids is important in predicting and tuning their macroscopic behavior. Here, we use a combination of high-energy X-ray diffraction, neutron diffraction, and molecular dynamics simulations to benchmark the atomic interactions in the high-temperature stable liquid and low-density amorphous solid states of hafnia. The diffraction results reveal that an average Hf-O coordination number of ~7 exists in both the liquid and amorphous nanoparticle forms studied. The measured pair distribution functions are compared to those generated from several simulation models in the literature. We have also performed ab initio and classical molecular dynamics simulations which show that density has a strong effect on the polyhedral connectivity. The liquid shows a broad distribution of Hf-Hf interactions, while the formation of low-density amorphous nanoclusters can reproduce the sharp split peak in the Hf-Hf partial pair distribution function observed in experiment. The agglomeration of amorphous nanoparticles condensed from the gas phase is associated with the formation of both edge-sharing and corner-sharing HfO6,7 polyhedra, resembling those observed in the monoclinic phase.
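
    The coordination number quoted above is conventionally obtained by counting neighbours within a cutoff, usually placed at the first minimum of the Hf-O pair distribution function. A schematic (non-periodic) version of that count, not the analysis code used in the study:

        import numpy as np

        def mean_coordination(pos_hf, pos_o, r_cut):
            """Average number of O atoms within r_cut of each Hf atom.
            pos_hf, pos_o: (N, 3) and (M, 3) Cartesian coordinate arrays.
            Periodic boundary conditions are ignored for simplicity."""
            d = np.linalg.norm(pos_hf[:, None, :] - pos_o[None, :, :], axis=-1)
            return (d < r_cut).sum(axis=1).mean()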

    Using binary paradata to correct for measurement error in survey data analysis

    Paradata refers here to unit-level data on an observed auxiliary variable, not usually of direct scientific interest, which may be informative about the quality of the survey data for the unit. There is increasing interest among survey researchers in how to use such data. Its use to reduce bias from nonresponse has so far received more attention than its use to correct for measurement error. This article considers the latter, with a focus on binary paradata indicating the presence of measurement error. A motivating application concerns inference about a regression model in which earnings is a covariate measured with error and whether a respondent refers to pay records is the paradata variable. We specify a parametric model allowing for either normally or t-distributed measurement errors and discuss the assumptions required to identify the regression coefficients. We propose two estimation approaches that take account of complex survey designs: pseudo-maximum likelihood estimation and parametric fractional imputation. These approaches are assessed in a simulation study and applied to a regression of a measure of deprivation on earnings and other covariates using British Household Panel Survey data. We find that the proposed approach to correcting for measurement error reduces bias and improves on the precision of a simple approach based only on the accurately observed cases. We outline briefly possible extensions to uses of this approach at earlier stages in the survey process. Supplemental materials are available online.
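
    The structure of such a measurement model can be made concrete with a small simulation in the spirit of the abstract (the notation and parameter values are mine, not the paper's): the binary paradata indicator switches the measurement error on or off, which is what the correction exploits:

        import numpy as np

        rng = np.random.default_rng(0)
        n = 1_000
        x_true = rng.normal(3.0, 0.5, n)        # true (log) earnings
        z = rng.binomial(1, 0.4, n)             # paradata: 1 = consulted pay records
        u = 0.3 * rng.standard_t(df=5, size=n)  # t-distributed measurement error
        x_obs = np.where(z == 1, x_true, x_true + u)   # records-based reports exact
        y = 1.0 - 0.5 * x_true + rng.normal(0.0, 1.0, n)  # outcome depends on truth
        # Regressing y on x_obs attenuates the slope; knowing (via z) which
        # observations carry error is what the proposed corrections exploit.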